Equilibrium correlations in a model for multidimensional epistasis 
We investigate a statistical model for multidimensional epistasis.
The genotype is divided into subsequences, and within each
subsequence mutations that occur in a prescribed order are
beneficial. The bit-string model used to represent the genotype may
be cast in the form of a ferromagnetic Ising model with a staggered
field. We obtain the correlations between mutations at different
sites, within an equilibrium population at a given tolerance, which
we define to be the temperature of the statistical ensemble.

I. INTRODUCTION

Although evolution takes place via a combination of 
random mutations and natural selection, it seems to pro- 
ceed rather rapidly along directed paths in the space of 
all possible genetic states. It is a challenging problem 
to try to understand the mechanisms which lead to this 
phenomenon.

Eigen has pointed out that each "species" actually con- 
sists of a more or less narrow distribution in the phase 
space of all possible genetic states, and this distribution 
may shift in response to environmental pressure. Natural
selection in response to environmental factors is usually
modelled in terms of a "fitness function" which is a
measure of the survival probability and/or reproductive 
capability of the individual. 

Those mutations which have a salutary effect on the fit- 
ness persist in the population and lead to new variants; 
other, neutral mutations may simply be carried along 
since they do not affect the well being of the individ- 
ual. Deleterious mutations usually affect the organism 
adversely, and the accumulation of too many will reduce 
the fitness drastically. 

The simplest hypothesis biologists have adopted regarding how the
number of mutations affects the fitness is that each deleterious
mutation reduces the fitness by an identical factor, say $1/a$,
$a > 1$. This is equivalent to assuming that the effect of each
deleterious mutation is independent of the others, or that there is
no "epistasis" between the mutations, and leads to a fitness
function which decays exponentially with $m$, the number of
mutations, as $f \sim \exp(-\alpha m)$, where $\alpha = \ln a$. A
different type of assumption takes $f$ to depend on $m$ in a
step-wise fashion, so that the value of $f$ is unaffected for $m$
less than a threshold, after which it is reduced drastically.
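
As a minimal sketch of these two non-epistatic hypotheses, the following Python snippet evaluates both fitness functions for a few values of $m$. The function names, the threshold `m_c` and the reduced value `f_low` are hypothetical choices made only for illustration; they are not taken from the literature cited here.

```python
import math

def multiplicative_fitness(m, a=2.0):
    """Fitness after m independent deleterious mutations, each costing a factor 1/a."""
    alpha = math.log(a)              # alpha = ln a, so exp(-alpha*m) = a**(-m)
    return math.exp(-alpha * m)

def threshold_fitness(m, m_c=3, f_low=0.0):
    """Step-wise fitness: unaffected for m < m_c, reduced to f_low otherwise.

    m_c and f_low are illustrative assumptions, not values from the text.
    """
    return 1.0 if m < m_c else f_low

if __name__ == "__main__":
    for m in range(6):
        print(m, multiplicative_fitness(m), threshold_fitness(m))
```

In the multiplicative case every additional mutation costs the same factor regardless of how many are already present, which is what the absence of epistasis means here.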

It is clear, however, that there can be epistatic interactions
between mutations at different points on the genetic string and that
the expression of unmutated genes may be affected by the presence of
mutations at certain loci, and so on. Therefore $f$ may depend not
only on the total number of mutations, but also on their location,
and may also increase as the result of mutations at certain loci. It
has recently been pointed out that the fitness may depend strongly
on the order in which certain mutations occur. As a case in point,
for a mutation leading to a certain modification to be beneficial,
one must already have had a mutation leading to the emergence of a
feature which will benefit from this modification.

This type of epistasis actually lends itself to a treatment in
terms of statistical equilibria, with the appropriate choice of a
fitness function.

In this paper we will represent a complete genomic sequence with
epistatic interactions by a one dimensional ferromagnetic Ising
model. We will subdivide the total genotype into subsequences (here
taken to be of length 2, without any loss of generality), and
stipulate that mutations can lead to salutary effects only if they
occur in a certain order within these subsequences. We will further
introduce a new quantity, the "tolerance" of the environment, which
will have to be taken into account to determine how strongly
epistatic interactions affect the overall fitness. Our aim will be
to compute, within this model, the effective correlations between
mutations at different sites, at fixed tolerance, within a
population at equilibrium.



II. THE MODEL 

Since Eigen first introduced the quasi-species model, bit-string
models of genetic evolution have been extensively studied
numerically. In this approach, the genotype of an individual is
represented by a string of Boolean variables $\sigma_i$,
$i = 1, \ldots, N$, which can be identified with a one dimensional
system of Ising spins. If one takes the wild type, or the initial
genotype, to consist of a string of 0's, each point mutation is
indicated by flipping the bit representing a given gene from 0 to 1.

We would like to avail ourselves of the analytically 
known results on the exactly solvable Ising model in equi- 
librium, to be able to make definite predictions regarding 
the correlation of mutated genes on a given genotype, 
under assumptions similar to those of Kondrashov and 
Kondrashov.

We divide the one dimensional string of spins representing the
state of the genome into dimers. We demand that the fitness be
increased relative to the wild type (all zeroes) only if the bits
that flip to 1 occur sequentially. Thus, within each dimer, (0,0),
(1,0), (1,1) are in increasing order of fitness, while (0,1) is less
fit than (0,0).

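Let us first construct a cost function by defining the
Ising Hamiltonian,

$$\mathcal{H} = -J \sum_{i} s_i s_{i+1} - K \sum_{i\,\mathrm{odd}} s_i - H \sum_{i\,\mathrm{even}} s_i , \qquad (1)$$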

where, for greater convenience in manipulation, we have defined the
variables $s_i = 2(\sigma_i - 1/2)$. The value of $\mathcal{H}$ for
each given sequence $\{s_i\}$ will serve as a cost function, in
terms of which we may define the fitness. Notice that in the first
term we have a coupling between nearest neighbors, which tends to
reduce the "cost" for those configurations in which the adjacent
"spins" are in the same state. If the constants $K$ and $H$, which
correspond to a staggered external field in an Ising model, are here
chosen as $K = 3J/4$ and $H = -J/4$, then we obtain a situation in
which the dimer configurations $(-1,1)$, $(-1,-1)$, $(1,-1)$ and
$(1,1)$ have decreasing cost. Then $f$ is defined as

$$f[\{s_i\}] = \frac{1}{Z}\, e^{-\beta \mathcal{H}[\{s_i\}]} , \qquad (2)$$

where $\beta$ is a measure of how effective the cost function is in
affecting the fitness, and $Z$ is a normalization factor so that
$f \in (0, 1)$. Note that $f[\{s_i\}]$ can be identified as the
Boltzmann factor in an equilibrium statistical model with the
Hamiltonian $\mathcal{H}$, at constant inverse "temperature"
$\beta$, and corresponds to the probability of observing, within an
equilibrium population, the particular genotype $\{s_i\}$.
Temperature may be seen as the amount of randomness, or disorder, in
the system, competing with the cost function in determining the
fitness. The higher the temperature, or randomness, the weaker will
be the effect of the cost function in determining the state of the
system. Therefore we define

$$T = \beta^{-1} \qquad (3)$$

as the tolerance in the system. Here $J$ is a measure of the
strength of the interaction between the states of each of the sites
(alleles), $\sigma_i$. Clearly, $\beta$ and $J$ will always occur
together in this model, in the product $\beta J$, and we may simply
absorb $J$ into the definition of $\beta$.
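
To make the role of the tolerance concrete, the following Python sketch enumerates all genotypes of a very short chain and evaluates their normalized Boltzmann weights. It assumes a cost function with a nearest-neighbor coupling $J$, a field $K$ on odd sites and a field $H$ on even sites, with periodic boundary conditions; the periodic closure, the chain length, the values of $T$ and the function names `cost` and `fitness_table` are assumptions of this sketch, not part of the model definition.

```python
import itertools
import numpy as np

def cost(s, J=1.0, K=0.75, H=-0.25):
    """Assumed cost of a configuration s (entries +1/-1) on a periodic chain."""
    s = np.asarray(s, dtype=float)
    coupling = -J * np.sum(s * np.roll(s, -1))   # nearest-neighbor term
    odd_field = -K * np.sum(s[0::2])             # sites 1, 3, ... (first site of each dimer)
    even_field = -H * np.sum(s[1::2])            # sites 2, 4, ... (second site of each dimer)
    return coupling + odd_field + even_field

def fitness_table(N, T, **params):
    """Return all 2**N genotypes and their normalized Boltzmann weights at tolerance T."""
    configs = list(itertools.product([-1, 1], repeat=N))
    energies = np.array([cost(c, **params) for c in configs])
    # Subtracting the minimum energy avoids overflow and cancels after normalization.
    weights = np.exp(-(energies - energies.min()) / T)
    return configs, weights / weights.sum()

if __name__ == "__main__":
    for T in (0.2, 1.0, 100.0):
        configs, f = fitness_table(4, T)
        best = configs[int(np.argmax(f))]
        print(f"T = {T}: most probable genotype {best}, "
              f"f_max/f_min = {f.max() / f.min():.3g}")
```

At small $T$ the weight concentrates on the lowest-cost genotype, while at large $T$ the ratio of the largest to the smallest weight approaches one, which is the sense in which a high tolerance weakens the effect of the cost function.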

The fitness $f$ is normalized to take values in $(0, 1)$ by
defining

$$Z = \sum_{\{s_i\}} e^{-\beta \mathcal{H}[\{s_i\}]} . \qquad (4)$$

Using the transfer matrix method, this sum may be computed exactly.
We may then compute the expectation values
$m_i = \langle s_i \rangle$. Note that the quantity $(m_i + 1)/2$
corresponds to the probability of finding a mutation on either of
the sublattices, $i$ odd or $i$ even. The results are shown in
Fig. 1, as a function of $T/J$, which is the (inverse) ratio of the
strength of the epistasis to the tolerance in the system. The
"staggered magnetization,"
$m_s = \langle s_{i\,(\mathrm{odd})} - s_{i\,(\mathrm{even})} \rangle$,
is shown in Fig. 2, and is twice the difference between the
probabilities of encountering a mutation on either of the two
sublattices (the first or the second sites belonging to a dimer). It
is seen to peak sharply at small values of the tolerance, and then
the difference decays to zero as the tolerance becomes very large,
at which point the fitness function becomes essentially flat.
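
A transfer matrix evaluation of the sublattice magnetizations can be sketched as follows. The sketch assumes a periodic chain of $N$ sites whose cost function has a nearest-neighbor coupling $J$, a field $K$ on odd sites and a field $H$ on even sites; the chain length, the boundary conditions, the sampled values of $T$ and the names `link_matrix` and `sublattice_magnetizations` are illustrative assumptions and are not claimed to reproduce the calculation behind the figures.

```python
import numpy as np

SPINS = np.array([1.0, -1.0])   # row/column ordering of the 2x2 matrices

def link_matrix(beta, J, field):
    """Weight exp(beta*(J*s*s' + field*s)); s indexes rows, s' indexes columns."""
    return np.exp(beta * (J * np.outer(SPINS, SPINS) + field * SPINS[:, None]))

def sublattice_magnetizations(T, N=64, J=1.0, K=0.75, H=-0.25):
    """Return (m_odd, m_even) for a periodic chain of N sites (N even) at tolerance T."""
    beta = 1.0 / T
    A = link_matrix(beta, J, K)        # bond from an odd site to the next even site
    B = link_matrix(beta, J, H)        # bond from an even site to the next odd site
    Sz = np.diag(SPINS)                # spin operator in the same basis
    scale = (A @ B).max()              # common rescaling keeps matrix powers finite
    cell = (A @ B) / scale             # transfer matrix of one dimer
    cell_even = (A @ Sz @ B) / scale   # same cell with s inserted at the even site
    rest = np.linalg.matrix_power(cell, N // 2 - 1)
    Z = np.trace(cell @ rest)          # rescaled partition sum
    m_odd = np.trace(Sz @ cell @ rest) / Z
    m_even = np.trace(cell_even @ rest) / Z
    return m_odd, m_even

if __name__ == "__main__":
    for T in (0.25, 0.5, 1.0, 2.0, 5.0):
        m_odd, m_even = sublattice_magnetizations(T)
        p_odd, p_even = (m_odd + 1) / 2, (m_even + 1) / 2   # mutation probabilities
        print(f"T/J = {T}: p_odd = {p_odd:.4f}, p_even = {p_even:.4f}, "
              f"m_s = {m_odd - m_even:.4f}")
```

The rescaling by `scale` cancels in every ratio, so the returned magnetizations are unaffected by it; it only prevents overflow in the matrix power. Very small values of $T$ can still overflow in the exponentials themselves.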

In Fig. 3a, b and c, we display the correlation functions
$C_2 = \langle s_i s_{i+2} \rangle$ and
$C_1 = \langle s_i s_{i+1} \rangle$, as well as the subtracted
correlation function
$C_s = \langle s_i s_{i+2} \rangle - \langle s_i \rangle \langle s_{i+2} \rangle$,
as a function of $T/J$. It can clearly be seen here as well that the
effect of epistatic interactions in building up correlations between
mutated sites on the gene string is felt strongly within a given
range of tolerances, in units of the strength of interaction. At
$T = 0$, since the genes are in the ordered state with all
$s_i = 1$, the excess correlation $C_s$ due to the interactions is
nil. In the other extreme of very large tolerances, the system is
completely disordered and correlations vanish, so that the two terms
in $C_s$ tend to each other, and both tend to zero.
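
The two limits described above can be checked numerically with a short transfer matrix sketch. It assumes a periodic chain with a nearest-neighbor coupling $J$ and fields $K$ and $H$ on odd and even sites, and it takes the reference site $i$ to be odd; both choices, as well as the chain length and the name `correlations`, are assumptions of this sketch.

```python
import numpy as np

SPINS = np.array([1.0, -1.0])   # row/column ordering of the 2x2 matrices

def link_matrix(beta, J, field):
    """Weight exp(beta*(J*s*s' + field*s)); s indexes rows, s' indexes columns."""
    return np.exp(beta * (J * np.outer(SPINS, SPINS) + field * SPINS[:, None]))

def correlations(T, N=64, J=1.0, K=0.75, H=-0.25):
    """Return (C1, C2, Cs) with i an odd site, on a periodic chain of N >= 4 sites."""
    beta = 1.0 / T
    A = link_matrix(beta, J, K)        # odd site -> even site
    B = link_matrix(beta, J, H)        # even site -> odd site
    Sz = np.diag(SPINS)
    scale = (A @ B).max()              # rescaling cancels in all ratios below
    cell = (A @ B) / scale
    rest = np.linalg.matrix_power(cell, N // 2 - 1)
    Z = np.trace(cell @ rest)
    m_odd = np.trace(Sz @ cell @ rest) / Z
    # <s_i s_{i+1}>: spins inserted on both ends of the odd-even bond
    C1 = np.trace(Sz @ A @ Sz @ B @ rest) / (scale * Z)
    # <s_i s_{i+2}>: spins inserted on analogous sites of neighboring dimers
    C2 = np.trace(Sz @ cell @ Sz @ rest) / Z
    Cs = C2 - m_odd**2                 # <s_i> = <s_{i+2}> by translation over one dimer
    return C1, C2, Cs

if __name__ == "__main__":
    for T in (0.05, 0.5, 1.0, 2.0, 10.0, 1000.0):
        C1, C2, Cs = correlations(T)
        print(f"T/J = {T}: C1 = {C1:.4f}, C2 = {C2:.4f}, Cs = {Cs:.4f}")
```

With these illustrative parameters the subtracted correlation is negligible at both ends of the range of $T$ and is appreciable only at intermediate tolerances.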

To further elucidate the meaning of tolerance, we may
compute the relative variances $v_m$, where

$$v_m^2 = \langle (s_i - m_i)^2 \rangle . \qquad (5)$$

It is easy to see that within a mean field approximation, where all
the spins interact pairwise with each other, i.e.,
$\mathcal{H} = -(J/N) \sum_{(ij)} s_i s_j$, one has $v_m = T/2J$;
thus the ratio $T/J$ is a measure of the size of the fluctuations
about the mean. In genome space, this means $\sqrt{T/J}$ is a
measure of the radius of the distribution of genotypes about the
most frequently encountered one, in equilibrium.

III. CONCLUSIONS 

In summary, we have cast an epistatic quasispecies model in terms
of a one dimensional Ising model with a staggered magnetic field, to
give greater advantage to certain subsequences of genes that may be
mutated. We defined a "tolerance" of the system, to introduce an
equilibrium statistical ensemble, namely one whose statistical
properties do not change in time. Correlations induced on the
genetic sequence of individuals in this equilibrium population have
been computed as a function of the tolerance and the strength of the
epistatic interaction, using exact solutions of the Ising model in
one dimension. It has been shown that non-trivial correlations
between mutated sites on the gene string may arise only in a finite
range of the tolerance for a given interaction strength.

Figure captions

1. The magnetization at a) odd, b) even sites of the one
dimensional Ising model, as a function of the "tolerance." The
probability of encountering mutations at these respective sites is
given by $(m_i + 1)/2$.

2. The "staggered magnetization" is twice the difference between
the probabilities of encountering mutated genes at the first or the
second site of the dimers into which the genome has been decomposed.

3. The correlation functions between mutated sites: a) analogous
sites on neighboring dimers, b) odd-even sites, c) the subtracted
correlation function between analogous sites.